12 research outputs found

    Multimodal One-Shot Learning of Speech and Images

    Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter". After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects, and asked to pick the "milk". Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task, and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching.
    Comment: 5 pages, 1 figure, 3 tables; accepted to ICASSP 2019
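    The nearest neighbour baseline mentioned above can be sketched in a few lines. The snippet below is an illustrative reconstruction, not the authors' code: dynamic time warping (DTW) compares the spoken query against the single support utterance per class, and pixel distance then matches that class's paired support image to the unseen test images. The feature shapes and toy data are assumptions.

```python
# Minimal sketch of a unimodal nearest-neighbour baseline for cross-modal
# matching: DTW over speech, then pixel distance over images. Illustrative
# only; shapes and data below are assumed, not from the paper.
import numpy as np

def dtw_distance(a, b):
    """DTW alignment cost between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def cross_modal_match(query_speech, support_speech, support_images, test_images):
    """Return the index of the test image predicted to match the spoken query.

    support_speech[k] and support_images[k] form the single audio-visual
    example for class k; test_images are the unseen instances to choose from.
    """
    # 1) Nearest support utterance to the spoken query (speech modality).
    k = int(np.argmin([dtw_distance(query_speech, s) for s in support_speech]))
    # 2) Nearest test image to that class's support image (visual modality).
    ref = support_images[k].ravel()
    dists = [np.linalg.norm(ref - img.ravel()) for img in test_images]
    return int(np.argmin(dists))

# Toy usage with random stand-in data: 11 classes, MFCC-like speech features.
rng = np.random.default_rng(0)
support_speech = [rng.normal(size=(40, 13)) for _ in range(11)]
support_images = [rng.normal(size=(28, 28)) for _ in range(11)]
test_images = [rng.normal(size=(28, 28)) for _ in range(11)]
query = support_speech[3] + 0.01 * rng.normal(size=(40, 13))
print(cross_modal_match(query, support_speech, support_images, test_images))
```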

    Antibacterial activity of traditional medicinal plants used by Haudenosaunee peoples of New York State

    Background: The evolution and spread of antibiotic resistance, as well as the evolution of new strains of disease-causing agents, is of great concern to the global health community. Our ability to effectively treat disease depends on the development of new pharmaceuticals, and one potential source of novel drugs is traditional medicine. This study explores the antibacterial properties of plants used in Haudenosaunee traditional medicine. We tested the hypothesis that extracts from Haudenosaunee medicinal plants used to treat symptoms often caused by bacterial infection would show antibacterial properties in laboratory assays, and that these extracts would be more effective against moderately virulent bacteria than against less virulent bacteria.
    Methods: After identification and harvesting, a total of 57 different aqueous extractions were made from 15 plant species. Nine plant species were used in Haudenosaunee medicines; six plant species, of which three are native to the region and three are introduced, were not used in traditional medicine. Antibacterial activity against mostly avirulent (Escherichia coli, Streptococcus lactis) and moderately virulent (Salmonella typhimurium, Staphylococcus aureus) microbes was inferred through replicate disc diffusion assays, and observed and statistically predicted MIC values were determined through replicate serial dilution assays.
    Results: Although there was not complete concordance between the traditional use of Haudenosaunee medicinal plants and antibacterial activity, our data support the hypothesis that the selection and use of these plants to treat disease was not random. In particular, four plant species exhibited antimicrobial properties as expected (Achillea millefolium, Ipomoea pandurata, Hieracium pilosella, and Solidago canadensis), with particularly strong effectiveness against S. typhimurium. In addition, extractions from two of the introduced species (Hesperis matronalis and Rosa multiflora) were effective against this pathogen.
    Conclusions: Our data suggest that further screening of plants used in traditional Haudenosaunee medicine is warranted, and we put forward several species for further investigation of activity against S. typhimurium (A. millefolium, H. matronalis, I. pandurata, H. pilosella, R. multiflora, S. canadensis).
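    As a rough illustration of how an observed MIC is read from such serial dilution assays (a minimal sketch under assumed data, not the study's analysis code), the MIC can be taken as the lowest extract concentration in a two-fold dilution series at which no replicate shows visible growth:

```python
# Minimal sketch: observed MIC from replicate two-fold serial dilution data.
# The data layout and values below are hypothetical illustrations.
def observed_mic(dilution_results):
    """dilution_results maps concentration (mg/mL) -> list of bools,
    where True means visible bacterial growth in that replicate well."""
    inhibitory = [conc for conc, growth in dilution_results.items() if not any(growth)]
    return min(inhibitory) if inhibitory else None  # None: no inhibition observed

# Hypothetical dilution series for one extract against one bacterial strain.
results = {
    10.0: [False, False, False],
    5.0:  [False, False, False],
    2.5:  [False, True,  False],
    1.25: [True,  True,  True],
}
print(observed_mic(results))  # -> 5.0 (mg/mL)
```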

    Multimodal one-shot learning of speech and images

    Thesis (MEng)--Stellenbosch University, 2020.
    ENGLISH ABSTRACT: Humans learn to perform tasks such as language understanding and visual perception, remarkably, without any annotations and from limited amounts of weakly supervised co-occurring sensory information. Meanwhile, state-of-the-art machine learning models, which aim to challenge these human learning abilities, require large amounts of labelled training data to generalise successfully. Multimodal one-shot learning is an effort towards closing this gap on human intelligence: we propose benchmark tasks that investigate whether machine learning systems are capable of performing cross-modal matching from limited, weakly supervised data. Specifically, we consider spoken word learning with co-occurring visual context in a one-shot setting, where an agent must learn novel concepts (words and object categories) from a single joint audio-visual example. In this thesis, we make the following contributions: (i) we propose and formalise multimodal one-shot learning of speech and images; (ii) we develop two cross-modal matching benchmark datasets for evaluation, the first containing spoken digits paired with handwritten digits, and the second containing complex natural images paired with spoken words; and (iii) we investigate a number of models within two frameworks, one extending unimodal models to the multimodal case, and the other learning joint audio-visual models. Finally, we show that jointly modelling spoken words paired with images enables a novel multimodal gradient update within a meta-learning algorithm for fast adaptation to novel concepts. This model outperforms our other approaches on our most difficult benchmark, with a cross-modal matching accuracy of 40.3% for 10-way 5-shot learning. Although we show that there is room for significant improvement, the goal of this work is to encourage further development on this challenging task. We hope to achieve this by defining a standard problem setting with tasks which may be used to benchmark other approaches.
    AFRIKAANSE OPSOMMING: Humans have the remarkable ability to learn language and visual concepts without annotated training data, using weak supervision in the form of parallel sensory input. Meanwhile, the best supervised machine learning models require massive annotated datasets in order to generalise to new inputs. Multimodal one-shot learning is an attempt to bridge this gap in the abilities of machine learning models. Here we propose a number of standard tests to determine whether new machine learning systems are able to perform cross-modal matching from only a few examples with limited supervision. More specifically, we investigate how spoken words that co-occur with corresponding visual concepts can be learned jointly in a one-shot setting, where a machine must learn new concepts (word and object categories) from a single joint audio-visual example. We make the following contributions: (i) we formalise multimodal one-shot learning from speech and images; (ii) we develop two datasets that serve as benchmarks for evaluating cross-modal matching: the first consists of spoken digits with corresponding handwritten digits, and the second consists of more complex photos with isolated spoken words; and (iii) we investigate several machine learning models in two setups: one in which unimodal models are extended to the multimodal case, and another in which audio-visual models are trained jointly. Finally, we investigate the joint learning of spoken words with corresponding visual concepts using a meta-learning algorithm. This model performs best in our most difficult test setting, with a cross-modal matching accuracy of 40.3% for 10-way 5-shot learning. We hope that by formally defining this problem and making standard tests available, we will encourage further research in this new and challenging field.
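    To make the cross-modal matching evaluation concrete, the following is a minimal sketch (my own illustration, not the thesis implementation): spoken queries and candidate images are mapped into an assumed joint audio-visual embedding space by placeholder embed_audio and embed_image functions, and an episode counts as correct when the nearest candidate image shares the query's class.

```python
# Minimal sketch of scoring cross-modal matching accuracy in a K-way episode.
# embed_audio / embed_image stand in for learned models (assumptions), and a
# joint audio-visual embedding space compared by cosine similarity is assumed.
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

def matching_accuracy(query_speech, query_labels, images, image_labels,
                      embed_audio, embed_image):
    """Fraction of spoken queries matched to an image of the correct class."""
    image_embs = [embed_image(x) for x in images]
    correct = 0
    for speech, label in zip(query_speech, query_labels):
        q = embed_audio(speech)
        best = int(np.argmax([cosine(q, e) for e in image_embs]))
        correct += int(image_labels[best] == label)
    return correct / len(query_labels)

# Toy usage: identity "embedders" over noisy class prototypes (10-way episode).
rng = np.random.default_rng(1)
protos = rng.normal(size=(10, 32))
images = [protos[k] + 0.1 * rng.normal(size=32) for k in range(10)]
queries = [protos[k] + 0.1 * rng.normal(size=32) for k in range(10)]
print(matching_accuracy(queries, range(10), images, range(10),
                        lambda a: a, lambda v: v))
```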

    Habitat Selection by Aquatic Invertebrates
